Bag of Words Involving Selection of Term Sets Using Apriori Algorithm in Text Classification

Author

  • R. Iswarya
Abstract

Text mining enables people to extract relevant information from a given text. This approach is advantageous because it allows users to retrieve useful information and helps to avoid ambiguity in the text. Text mining tasks are classified as text classification, text clustering, and document summarization. Phrases are important in text mining and information retrieval. Identifying and classifying phrases is of major importance because phrases are constructed from term sets. A term set, or item set, is usually a set of two terms (a bigram). The occurrence of a term in a document can be recorded with binary weights, and documents can be represented with the Bag of Words (BOW) model. The main motivation of this paper is to construct term sets by computing the frequency of each term in the corresponding document and to provide a weight for each term. Term sets involving both adjacent and nonadjacent pairs are considered and used to classify documents as positive or negative. Association rule mining is used to construct the term sets. The widely used News Group dataset, which consists of twenty thousand messages, was used. Preprocessing was done first, by means of stop word removal and stemming. Finally, association rules are formed using the Apriori algorithm, and term sets are formed.
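The term set construction described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the stop word list, the toy documents, the `min_support` threshold, and the function names `tokenize` and `frequent_term_sets` are all assumptions made for illustration, and stemming is omitted to keep the example free of external libraries. Each document is reduced to a set of terms (binary weights), so a frequent pair may consist of adjacent or nonadjacent terms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy stop word list; a real run would use a fuller list.
STOP_WORDS = {"the", "a", "is", "of", "and", "to", "in"}


def tokenize(text):
    """Lowercase, strip punctuation, drop stop words.

    Returns a set of terms, i.e. a binary bag of words: a term is
    either present in the document or not. Stemming is omitted here.
    """
    terms = (w.strip(".,;:!?").lower() for w in text.split())
    return {t for t in terms if t and t not in STOP_WORDS}


def frequent_term_sets(docs, min_support=2, max_size=2):
    """Apriori-style search for term sets found in >= min_support documents."""
    transactions = [tokenize(d) for d in docs]

    # Level 1: count the documents that contain each single term.
    counts = Counter(t for tx in transactions for t in tx)
    current = {frozenset([t]) for t, c in counts.items() if c >= min_support}
    result = {frozenset([t]): c for t, c in counts.items() if c >= min_support}

    k = 2
    while current and k <= max_size:
        # Join step: merge frequent (k-1)-sets into candidate k-sets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Support counting: number of documents containing the candidate.
        support = {c: sum(1 for tx in transactions if c <= tx) for c in candidates}
        current = {c for c, n in support.items() if n >= min_support}
        result.update({c: support[c] for c in current})
        k += 1
    return result


# Toy documents, invented for this example.
docs = [
    "The stock market fell sharply.",
    "Stock market analysts expect a rally.",
    "The football match ended in a draw.",
    "Market prices and stock indexes rose.",
]
term_sets = frequent_term_sets(docs, min_support=2)
for terms, support in sorted(term_sets.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(terms), support)
```

In the fourth toy document "market" and "stock" are not adjacent, yet the pair still counts toward the support of the term set, which mirrors the abstract's point about considering nonadjacent pairs.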

Similar resources

An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification

The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...

Polarimetric Synthetic Aperture Radar Image Classification using Bag of Visual Words Algorithm

Land cover is defined as the physical material of the surface of the earth, including different vegetation covers, bare soil, water surface, various urban areas, etc. Land cover and its changes are very important and influential on the Earth and life of living organisms, especially human beings. Land cover change monitoring is important for protecting the ecosystem, forests, farmland, open spac...

An Improvement in Support Vector Machines Algorithm with Imperialism Competitive Algorithm for Text Documents Classification

Due to the exponential growth of electronic texts, their organization and management requires a tool to provide information and data in search of users in the shortest possible time. Thus, classification methods have become very important in recent years. In natural language processing and especially text processing, one of the most basic tasks is automatic text classification. Moreover, text ...

Designing Semantic Kernels as Implicit Superconcept Expansions

Recently, there has been an increased interest in the exploitation of background knowledge in the context of text mining tasks, especially text classification. At the same time, kernel-based learning algorithms like Support Vector Machines have become a dominant paradigm in the text mining community. Amongst other reasons, this is also due to their capability to achieve more accurate learning r...

Publication date: 2015